Faster Rates for Policy Learning

Authors

  • Alexander R. Luedtke
  • Antoine Chambaz

Abstract

This article improves on the existing proven rates of regret decay in optimal policy estimation. We give a margin-free result showing that, for empirical risk minimizers over Donsker classes, the regret incurred when estimating a within-class optimal policy is second-order: it decays faster than the standard error of an efficient estimator of the value of an optimal policy. We also give a result from the classification literature showing that even faster regret decay is possible via plug-in estimation, provided a margin condition holds. We consider four examples, in which the regret is expressed in terms of either the mean value or the median value; the number of possible actions is either two or finitely many; and the sampling scheme is either independent and identically distributed or sequential, where the latter represents a contextual bandit sampling scheme.
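The contrast between empirical risk minimization over a policy class and plug-in estimation can be illustrated on a toy problem. The sketch below is not the authors' method or any of their four examples; it assumes a hypothetical model with X uniform on [-1, 1], a binary action A randomized with known probability 1/2, and outcome Y = A·X plus standard normal noise, so the optimal policy treats exactly when x > 0. The policy class is the set of threshold rules d_t(x) = 1{x > t}, and the function names (`simulate`, `ipw_value`, `erm_threshold`, `plugin_threshold`, `regret`) are illustrative. In this model the regret of d_t is exactly t²/4, so it shrinks quadratically in the threshold error. The treatment effect τ(X) = X also satisfies P(|τ(X)| ≤ s) = s for s in [0, 1], a margin-type condition of the kind used in the classification literature.

```python
import random


def simulate(n, seed=0):
    """Draw n i.i.d. (x, a, y) triples from a hypothetical toy model:
    X ~ Uniform(-1, 1), A ~ Bernoulli(1/2) independent of X,
    Y = A * X + N(0, 1) noise, so the treatment effect is tau(x) = x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = 1 if rng.random() < 0.5 else 0
        y = a * x + rng.gauss(0.0, 1.0)
        data.append((x, a, y))
    return data


def true_value(t):
    """Exact mean outcome of the threshold policy d_t(x) = 1{x > t} under
    the toy model: E[X * 1{X > t}] = (1 - t^2) / 4 for t in [-1, 1]."""
    t = max(-1.0, min(1.0, t))
    return (1.0 - t * t) / 4.0


def regret(t):
    """Value of the optimal rule d_0 (treat iff x > 0) minus the value of
    d_t; in this toy model it equals t^2 / 4 for t in [-1, 1]."""
    return true_value(0.0) - true_value(t)


def ipw_value(data, t, propensity=0.5):
    """Inverse-probability-weighted estimate of the value of d_t, using the
    known randomization probability P(A = a | X) = 1/2 for both actions."""
    total = 0.0
    for x, a, y in data:
        if a == (1 if x > t else 0):
            total += y / propensity
    return total / len(data)


def erm_threshold(data, propensity=0.5):
    """Empirical (IPW) value maximizer over threshold policies. The empirical
    value only changes when t crosses an observed x, so it suffices to scan
    t = -1 and t = each sorted x, updating the IPW sum incrementally."""
    pts = sorted(data)
    running = sum(y / propensity for _, a, y in pts if a == 1)  # t = -1: treat all
    best_t, best_sum = -1.0, running
    for x, a, y in pts:
        # Raising t to x moves this point from "treat" to "do not treat".
        if a == 0:
            running += y / propensity
        else:
            running -= y / propensity
        if running > best_sum:
            best_t, best_sum = x, running
    return best_t


def ols(xs, ys):
    """Ordinary least squares fit y ~ intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def plugin_threshold(data):
    """Plug-in rule: fit a linear outcome model in each arm, form
    tau_hat(x) = mu1_hat(x) - mu0_hat(x), and treat iff tau_hat(x) > 0.
    With linear fits this is a threshold rule when the slope difference is
    positive (true with high probability in this toy model)."""
    fits = {}
    for arm in (0, 1):
        xs = [x for x, a, _ in data if a == arm]
        ys = [y for _, a, y in data if a == arm]
        fits[arm] = ols(xs, ys)
    b0 = fits[1][0] - fits[0][0]
    b1 = fits[1][1] - fits[0][1]
    if b1 <= 0:
        raise ValueError("estimated effect is not increasing in x")
    return max(-1.0, min(1.0, -b0 / b1))


if __name__ == "__main__":
    for n in (200, 2000, 20000):
        data = simulate(n, seed=1)
        t_erm, t_plug = erm_threshold(data), plugin_threshold(data)
        print(f"n={n}: ERM regret={regret(t_erm):.5f}, "
              f"plug-in regret={regret(t_plug):.5f}")
```

Running the script prints the regret of each estimated rule for a few sample sizes; the numbers depend on the seed and are not meant to reproduce any rate stated in the article. The ERM scan relies on the empirical value being piecewise constant in t, so only observed covariate values need to be checked.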




Publication date: 2017